Add Windows PTY support for executables and containers (HMP v1 over UDS)#133
Add Windows PTY support for executables and containers (HMP v1 over UDS)#133mitchdenny wants to merge 5 commits intomainfrom
Conversation
Phase 3 of Aspire WithTerminal end-to-end. Adds DCP-side terminal support for Executable resources on Windows via ConPTY. The PTY-attached process is reachable from the Aspire terminal host through a Hex1b Multiplex Protocol v1 server on a per-replica Unix domain socket. API: - api/v1/terminal_types.go: TerminalSpec (Enabled, UDSPath, Cols, Rows) - api/v1/executable_types.go: ExecutableSpec.Terminal *TerminalSpec Implementation: - internal/hmp1/: HMP v1 server (Hello/StateSync/Output/Input/Resize/Exit) - internal/exerunners/pty_windows.go: ConPTY wrapper via UserExistsError/conpty - internal/exerunners/pty_other.go: non-Windows stub returning ErrTerminalNotSupported - internal/exerunners/terminal_session.go: per-executable UDS listener + accept loop, single-viewer model with graceful shutdown that flushes the Exit frame - internal/exerunners/process_executable_runner.go: branches on Terminal.Enabled in StartRun and tears the session down in StopRun Tests: - internal/hmp1/server_test.go: 6 unit tests covering Hello/StateSync, output forwarding, input/resize delivery, Exit on PTY exit, oversize-frame rejection - internal/exerunners/terminal_session_windows_test.go: end-to-end smoke test that spawns cmd.exe under ConPTY, dials the UDS as an HMP v1 client, and asserts the full Hello -> StateSync -> Output -> Exit frame sequence with the literal echoed text and exit code 0
Refactors the Windows PTY/HMP v1 plumbing introduced in PR #133 from internal/exerunners into a shared internal/termpty package, then layers container terminal support on top using the same primitives. Refactor (no behavior change for executables): - New internal/termpty package with Process, Session, CommandSpec, SessionConfig and BuildWindowsCommandLine. - internal/exerunners now consumes termpty.StartProcess and termpty.StartSession; the executable terminal session windows test moves into termpty/session_windows_test.go. Container PTY: - ContainerSpec gains a Terminal *TerminalSpec field (mirrors the shape on ExecutableSpec). - Docker and Podman applyCreateContainerOptions append "-t -i" to `docker create` / `podman create` when Terminal is enabled, so the container runtime allocates a real PTY for the container's PID 1. - New controllers/container_terminal.go::startContainerTerminalSession spawns the orchestrator's CLI as `<runtime> start --attach --interactive <id>` under a host ConPTY, exposing the resulting byte stream as an HMP v1 producer at TerminalSpec.UDSPath. - runningContainerData tracks the live *termpty.Session; the new closeTerminalSession helper is invoked from deleteContainer so the attach process is reaped during teardown. - ContainerReconciler.ensureContainerTerminalSession invokes the helper after a successful start (both the simple-network and custom-network code paths). Terminal attach errors are non-fatal: the container keeps running, just without an attachable terminal. The terminal attach process exiting (because the container exited or was removed) signals container exit through the same Session.Done() channel that the executable runner already uses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Container support has been split out and stacked on top of this PR: #138 (DCP side) + microsoft/aspire#16762 (Aspire side). |
docker container start <id> is documented to start STOPPED containers; on
a running container, '--attach --interactive' is a no-op and the host PTY
never wires up to PID 1's stdin/stdout. Because the container reconciler
already starts the container detached (via StartContainers) before
ensureContainerTerminalSession runs, the second 'start' call did nothing
and the host saw no terminal traffic.
Switch to 'attach' on the running container with:
--sig-proxy=false - keep host-process Ctrl-C from killing the
container; signals are delivered in-band via
HMP v1 input bytes (e.g. 0x03).
--detach-keys= - disable the default Ctrl-P,Ctrl-Q detach so
those keystrokes pass through as raw bytes.
Container creation already opens TTY+stdin via -t -i (in
applyCreateContainerOptions), so 'attach' delivers a usable
bidirectional PTY stream.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously DCP listened on the producer UDS and the Aspire terminal host dialed in. That ordering meant the terminal host could only attach AFTER DCP had already started the PTY-attached process AND opened its listener. Anything the underlying shell/process printed in that window (notably bash's first PS1 prompt) was lost, because docker attach (and a fresh PTY read pump in general) does not replay buffered TTY output. This commit reverses just the TCP role on the producer link: * Terminal host listens on the producer UDS (it owns the file now). * DCP dials in via dialUDSWithRetry (50 x 100ms) which absorbs the small window between the host process being marked running and its listener actually being bound. The resource-model WaitAnnotation already sequences the host before any resource that uses it. * The HMP v1 protocol roles are unchanged: DCP still serves the protocol (it owns the PTY data source); the host still consumes it. Only the underlying TCP direction flipped. Session lifecycle simplifications that fall out of the flip: * No listener field, no acceptLoop, no per-session multi-viewer slot. Exactly one connection per session. * DCP no longer owns the UDS file; os.Remove calls in StartSession, watchExit, and Close are gone. * The connection is established eagerly in StartSession, so a dial failure surfaces immediately to the caller. Why this fixes the missed-prompt symptom: with the new direction the terminal host's listener exists before DCP attempts to dial. Once DCP dials, hmp1.Serve immediately starts pumping PTY output. The terminal host's headless presentation captures that output into its own scrollback, and any dashboard/CLI viewer that connects later gets a StateSync replay from the host - even if their attach happens many seconds after the shell's first prompt. The Windows test now plays the listening side: net.Listen first, then StartSession (which dials), then Accept. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously Session.Close() and watchExit() both raced to close the same ConPTY handles (PseudoConsole, pi.Process, pi.Thread, pty I/O pipes) while cp.Wait() was still polling WaitForSingleObject(pi.Process) in a 1-second loop. The double-close pattern is dangerous on Windows: when a HANDLE value is closed and the kernel recycles it for an unrelated object in the same process, a second CloseHandle on the original numeric value closes the unrelated object. If the recycled handle happens to be DCP's process cleanup job (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE in pkg/process/os_executor_windows.go), Windows immediately terminates every process assigned to that job. In the Aspire WithTerminal scenario this means clicking Stop on a single PTY-attached resource cascade-kills the dashboard, every other terminal-host replica, and every sibling executable/container managed by the same DCP — while DCP itself stays alive because its own handle table is fine. Refactor watchExit() to be the *sole* owner of session teardown: * The PTY wait is isolated to a private inner goroutine that is the only code that ever observes pi.Process. It exits when the process exits naturally OR when watchExit closes the PTY. * watchExit selects on (waitDone, stopCh): natural exit takes the first branch; explicit Close() takes the second by closing stopCh. * In both branches watchExit calls PTY.Close exactly once, drains the inner wait goroutine, gives the in-flight HMP v1 handler a bounded window to flush its Exit frame, and finally closes the connection. * Close() no longer touches any conpty handles or the connection. It just signals teardown via stopCh (guarded by sync.Once) and blocks on doneCh with a 5s safety timeout. This eliminates the double-close pattern entirely. PTY handles are closed exactly once and only after the wait goroutine that owns them has been drained. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix: cascade kill on Stop of WithTerminal-attached resourcePushed 38f88c9 to address a Windows-specific bug discovered during end-to-end testing of WithTerminal in the Aspire repo. SymptomClicking Stop in the Aspire dashboard on a resource started under .WithTerminal(...) would kill not just that resource, but every other DCP-managed child in the same run: the Aspire dashboard itself, all �spire.terminalhost replicas, and every sibling executable/container — leaving only DCP, the AppHost, and the CLI alive. Stopping a non-PTY resource never reproduced this. A 10-second heartbeat in the dashboard process would stop emitting between consecutive ticks (i.e. process death between heartbeats), with no managed exception, no ProcessExit, no UnobservedTaskException. Root cause\Session.Close()\ and \watchExit()\ both raced to close the same conpty handles (\PseudoConsole, \pi.Process, \pi.Thread, the four PTY pipe handles) while \cp.Wait()\ was concurrently polling \WaitForSingleObject(pi.Process)\ in a 1-second loop (\internal/termpty/session.go:185-212\ and \pkg/mod/github.com/UserExistsError/conpty@v0.1.4/conpty.go:252-261). The double-close pattern is dangerous on Windows. When a HANDLE value is closed and the kernel recycles it for an unrelated kernel object in the same process, a second \CloseHandle\ on the original numeric value closes the unrelated object. If that recycled handle happens to be DCP's process cleanup job handle (created in \pkg/process/os_executor_windows.go:209-238\ with \JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE), the kernel immediately terminates every process assigned to that job — which is exactly what we saw. FixRefactor \watchExit()\ to be the sole owner of session teardown:
Tests: \go test ./internal/termpty/... ./internal/exerunners/... ./internal/hmp1/... ./controllers/...\ all green; \go vet ./...\ clean. Will validate against the live Aspire repro next. |
karolz-ms
left a comment
There was a problem hiding this comment.
Notes for myself to check/clean things up
| // capture. Terminal mode requires Windows on the host (ConPTY) for the | ||
| // initial slice; non-Windows hosts return ErrTerminalNotSupported at | ||
| // startup. | ||
| Terminal *TerminalSpec `json:"terminal,omitempty"` |
There was a problem hiding this comment.
This needs to be included in the validation (for both initial object creation and object update) and in the lifecycle key calculation.
| // | ||
| // When Enabled is true, DCP allocates a PTY for the underlying process and | ||
| // listens on UDSPath (a Unix Domain Socket path on Linux/macOS, or a named pipe | ||
| // path on Windows in a follow-up). When the terminal host opens an HMP v1 |
There was a problem hiding this comment.
Why not Unix domain sockets on Windows? We use them for pumping DCP logs into Aspire dashboard already...
| // - Process exit -> HMP v1 Exit frame, then close | ||
| // | ||
| // The HMP v1 wire format is defined by the Aspire dashboard's terminal host | ||
| // (see Hex1b's Hmp1Protocol). DCP's responsibility is limited to PTY |
There was a problem hiding this comment.
Add a URL to the HMP spec here.
| type TerminalSpec struct { | ||
| // Enabled controls whether DCP allocates a PTY for the process and exposes | ||
| // an HMP v1 producer endpoint at UDSPath. | ||
| Enabled bool `json:"enabled,omitempty"` |
There was a problem hiding this comment.
Why do we need this flag? What does it mean to have TerminalSpec present in an Executable or Controller spec (*Terminal is not nil) but Enabled is false? Is this a valid spec and if so, what is the semantics?
|
|
||
| // UDSPath is the Unix Domain Socket path that DCP listens on for the | ||
| // terminal host's HMP v1 client connection. Required when Enabled is true. | ||
| UDSPath string `json:"udsPath,omitempty"` |
There was a problem hiding this comment.
Do we want to support changing UDSPath for existing Container or Executable? If yes, we should handle it in the controller(s). If not we should prevent that change via validation.
| // uses. Empty strings, strings containing whitespace, or strings containing | ||
| // quotes get wrapped in double quotes; backslashes preceding a quote are | ||
| // doubled. | ||
| func quoteWindowsArg(arg string) string { |
There was a problem hiding this comment.
There is Windows API to do this
| // CommandLine is the full command line in CreateProcessW form (a single | ||
| // string with arguments embedded and quoted as appropriate). Callers can | ||
| // build this via BuildWindowsCommandLine. | ||
| CommandLine string |
There was a problem hiding this comment.
Probably not... should prefer arg array
|
|
||
| // Env is the environment block passed to the child process (each entry | ||
| // is "KEY=VALUE"). May be nil to inherit the parent's environment. | ||
| Env []string |
| every = 100 * time.Millisecond | ||
| ) | ||
| var lastErr error | ||
| for i := 0; i < attempts; i++ { |
There was a problem hiding this comment.
We have reliability package for retries
|
|
||
| tp *Process | ||
|
|
||
| mu sync.Mutex |
There was a problem hiding this comment.
There is a lot of synchronization primitives used here. Need to dig into this component and simplify it.
Implements PTY support for DCP
Executableresources on Windows via ConPTY, exposing the PTY to consumers (Aspire terminal host) over a per-executable Unix domain socket using HMP v1 (the Hex1b Multiplex Protocol).This is the DCP-side slice of aspire/16317 ΓÇöWithTerminal(...), tracked in dcp/6.## Scope| Slice | This PR | Follow-up ||---|---|---|| Windows executables (ConPTY) | Yes | ΓÇö || Linux/macOS executables (creack/pty) | No | Yes || Containers (docker/podman--tty) | No | Yes |The Aspire side (WithTerminal()API + terminal host that consumes HMP v1) is being prepared in parallel and will land against this DCP version.## APIapi/v1/terminal_types.goaddsTerminalSpec:gotype TerminalSpec struct { Enabled bool UDSPath string Cols int32 Rows int32}````api/v1/executable_types.go` adds `ExecutableSpec.Terminal *TerminalSpec` (nil = current log-streaming behavior, set = PTY-attached + HMP v1 server).## Implementation- `internal/hmp1/server.go` ΓÇö HMP v1 protocol server: `[type:1B][length:4B LE][payload:N]` framing, frame types Hello/StateSync/Output/Input/Resize/Exit. Bidirectional pumps with graceful teardown via context-cancellation read-deadline nudge.- `internal/exerunners/pty_windows.go` ΓÇö wraps the `UserExistsError/conpty` package as `hmp1.PTY`. Includes Windows argv-quoting for `CreateProcessW`.- `internal/exerunners/pty_other.go` ΓÇö non-Windows stub returning `ErrTerminalNotSupported`.- `internal/exerunners/terminal_session.go` ΓÇö per-executable UDS listener + accept loop. Single-viewer model (next connection bumps the previous). Graceful shutdown on natural process exit waits for in-flight handlers to flush their `Exit` frame before closing the listener.- `internal/exerunners/process_executable_runner.go` ΓÇö branches on `exe.Spec.Terminal.Enabled` in `StartRun` and tears the session down in `StopRun`. Real exit code propagates through `OnRunCompleted`.## Tests- 6 HMP v1 protocol unit tests (`internal/hmp1/server_test.go`) ΓÇö Hello/StateSync emission, output forwarding, input/resize delivery, Exit on PTY exit, oversize-frame rejection.- End-to-end smoke test (`internal/exerunners/terminal_session_windows_test.go`, build-tagged `windows`) ΓÇö spawns `cmd.exe /c "echo hello-world-from-conpty && exit 0"` under ConPTY at 100x30, dials the UDS as an HMP v1 client, asserts the full `Hello -> StateSync -> Output -> Exit(code=0)` sequence with the literal echoed text in the captured Output stream.## Validation# Windowsgo build ./...go test ./internal/hmp1 ./internal/exerunners -count 3```(Run with-count 3deliberately ΓÇö the HMP v1 server has a few timing-sensitive paths (deadline nudges, graceful shutdown) that single-pass runs would not reliably catch.)## Not in this PR- Containers (`Container.Spec.Terminal`).- Linux/macOS executables.- Aspire-side `WithTerminal()` plumbing.- DCP version bump in Aspire.Companion PR
Aspire side: microsoft/aspire#16760 — WithTerminal(): per-replica interactive terminal sessions (Aspire side, draft)